In [1]:
import graphlab as gl
gl.canvas.set_target("ipynb")
In [2]:
# Download and parse
ratings = gl.SFrame.read_csv('ml-1m/ratings.dat', delimiter='::', header=False)
items = gl.SFrame.read_csv('ml-1m/movies.dat', delimiter='::', header=False)
# Rename columns
ratings = ratings.rename({'X1': 'user_id', 'X2': 'item_id', 'X3': 'score', 'X4': 'timestamp'})
items = items.rename({'X1': 'item_id', 'X2': 'title_year', 'X3': 'genres'})
In [3]:
ratings
Out[3]:
In [4]:
items
Out[4]:
In [5]:
items.show()
In [6]:
# Split "Title (Year)" into separate title and year columns
items['title'] = items['title_year'].apply(lambda x: x[:-7])
# Re-encode titles from Latin-1 to UTF-8
items['title'] = items['title'].apply(lambda x: x.decode('iso8859').encode('utf-8'))
items['year'] = items['title_year'].apply(lambda x: x[-5:-1])
# Turn the pipe-delimited genre string into a list
items['genres'] = items['genres'].apply(lambda x: x.split('|'))
del items['title_year']
In [7]:
items
Out[7]:
How many unique users do we have?
In [8]:
ratings['user_id'].unique().size()
Out[8]:
In [9]:
items.show()
In [10]:
# Explicit feedback: keep the user, item, and numeric score columns
explicit = ratings[['user_id', 'item_id', 'score']]
explicit
Out[10]:
In [12]:
# Implicit feedback: treat ratings of 4 or higher as a positive signal
implicit = explicit[explicit['score'] >= 4.0][['user_id', 'item_id']]
implicit
Out[12]:
In [15]:
m = gl.recommender.create(implicit, 'user_id', 'item_id')
Because no target column was given, the call above trained an item_similarity
model. It computed Jaccard similarities between the items in this dataset, then ranked the top 100 most similar items for each item and stored them so they can be used at prediction time. For more information on how this model works, see the API reference.
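As a rough illustration, the Jaccard similarity between two items is the number of users who interacted with both divided by the number who interacted with either. A minimal sketch in plain Python, with made-up toy user sets (not the model's actual implementation):
# Toy example: Jaccard similarity between two items' user sets
users_a = set([1, 2, 3, 4])   # users who watched item A
users_b = set([3, 4, 5])      # users who watched item B
jaccard = float(len(users_a & users_b)) / len(users_a | users_b)
print(jaccard)  # 0.4 -> 2 shared users out of 5 distinct users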
In [16]:
m
Out[16]:
In [43]:
items[items['item_id'] == 1287]
Out[43]:
In [44]:
m.get_similar_items([1287], k=5)
Out[44]:
In [45]:
m.get_similar_items([1287]).join(items, on={'similar': 'item_id'}).sort('rank')
Out[45]:
In [46]:
m2 = gl.recommender.create(explicit, 'user_id', 'item_id', target='score')
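Unlike m, m2 was trained with target='score', so it learns to predict the numeric rating a user would give an item rather than ranking items by co-occurrence alone. A minimal sketch of using it, assuming the standard predict() and recommend() methods on GraphLab recommender models (the exact model class chosen by create() may vary):
# Predict scores for a few observed (user, item) pairs
m2.predict(explicit.head(5))
# Recommendations for user 4, now ranked using the learned score model
m2.recommend(users=[4], k=5).join(items, on='item_id').sort('rank')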
In [47]:
# Generate recommendations for every user seen during training
recs = m.recommend()
In [48]:
recs
Out[48]:
In [49]:
ratings[ratings['user_id'] == 4].join(items, on='item_id')
Out[49]:
In [50]:
m.recommend(users=[4], k=20).join(items, on='item_id').sort('rank')
Out[50]:
In [51]:
m.recommend?
In [53]:
recent_data = gl.SFrame()
recent_data['item_id'] = [1291] # Indiana Jones and the Last Crusade
recent_data['user_id'] = 99999
In [54]:
m.recommend(users=[99999], new_observation_data=recent_data).join(items, on='item_id').sort('rank')
Out[54]:
In [55]:
m.save('my_model')
In [56]:
m_again = gl.load_model('my_model')
In [57]:
m_again
Out[57]:
In [58]:
items.save('items')
ratings.save('ratings')
explicit.save('explicit')
implicit.save('implicit')
In [ ]: